Abstract. Flow-based traffic measurement is a very challenging problem: managing counters for each individual traffic flow in hardware is known to scale poorly with high-speed links. In this paper we propose a novel lattice theory-based approach that improves flow-based measurement performance and scales by keeping the number of maintained hardware counters to a minimum (a result mathematically established in the paper). The crucial contribution of the lattice is to map the computational semantics of packet processing to user requests for traffic measurement, thus allowing for a better-informed and focused counter assignment. An implementation over an OpenFlow switch, FlowME, was developed and evaluated with respect to its memory usage, performance overhead, and processing effort to generate the minimal solution. Experimental results indicate a significant decrease in resource consumption.

I. Introduction 

Network traffic measurement is an essential activity that allows network managers to get the visibility required for daily operations and network evolution planning. Tools to observe per-flow traffic must scale with a wide spectrum of applications, flows and queries while maintaining the performance of the underlying hardware, achieving accurate traffic measurements and operating at wire speed [1]. Conventional solutions like NetFlow sample traffic and send per-flow statistics to a remote server for use by user applications, thus incurring inaccurate statistics and intensive resource and network bandwidth usage. Recent works [2], [3] on application-aware traffic measurement also use prior knowledge about user requirements, i.e., user queries, to achieve adaptive measurements, but they require dedicated packet classification mechanisms to carry out the measurement task.

A significant drawback of current solutions is that they largely ignore the computational structure of packet processing and the induced query-to-flow associations. We regard these associations as crucial and believe that only a structure that correctly expresses them has the richness and flexibility to support the search for an optimal counter assignment. Thus, we turn to concept lattices and formal concept analysis (FCA) [4].

Fig. 1. Architecture Overview

Our lattice-based traffic measurement method, FlowME, enables fine-grained querying of network traffic flows and extraction of query-bound flow measurements. Figure 1 illustrates its main components. As the main supporting structure, a hierarchy of high-level flows-to-matchfields abstractions, the concepts, is constructed (a), and each query is mapped to a unique node thereof, its target concept (comprising the answer set of flows). The hierarchy, or concept lattice, factors out commonalities in the answer sets by turning them into concepts. Targets induce a sub-hierarchy where each node, called a projection, corresponds to an intersection of answer sets. In order to avoid redundancies in the resulting family of sets, each flow is mapped to a minimal projection, its ground (b). We show that by assigning a counter to every ground, a system of counters is obtained which is both minimal in size and allows all queries to get a precise answer (c). As a result, in FlowME the hardware counters are kept for disjoint sets of flow entries (instead of passively monitoring all flows individually). Moreover, memory usage is further reduced by focusing only on flows matching user queries.

The contributions of this work are as follows. 

• We propose four algorithms for constructing/maintaining the concept lattice and its projection substructure (Section II). The projection algorithms are original methods that underlie the central task of partitioning the global set of flows into disjoint subsets (each to be assigned a counter).

• We prove that the number of counters established in this way is minimal w.r.t. the requirements of (i) answering all active queries, and (ii) assigning a single counter to each flow (Theorems 3.10 and 3.11 in Section III).

• The FlowME solution can be used for flow-based measurement in a wide range of network devices and is expected, in particular, to enable effective monitoring in OpenFlow switches. Its implementation over an OpenFlow Pizzabox switch largely outperforms a per-flow counter assignment at a reasonable computational cost.

In the remainder of the paper, we first present our lattice construction and updating algorithms (Section II). In Section III, the mathematical foundations of our solution are summarized and its efficiency/minimality is proven. Section IV presents the major components of our implementation as well as its performance evaluation results. We discuss related work in Section V and conclude in Section VI.

II. Algorithms 

Following a novel statement of the traffic measurement problem, we introduce a set of easy-to-implement algorithms for building/maintaining lattices and counter structures and illustrate them with an example. The global workflow is illustrated in Figure 2. The lattice building algorithm (a) outputs a structure that hierarchically organizes flow entries. The flowset partition identification (b) and extraction (c) algorithms find optimal groups of flow entries based on user queries.

Fig. 2. Lattice-based traffic measurement overview



A. Definitions 

Below, a summary of the notions underlying our approach is provided (see [4], [5] for a complete coverage).

• A flow entry $f$ is defined by its set of matchfields $\{h_1, h_2, \ldots, h_n\}$ and is assigned a counter that is updated whenever a packet matches $f$. Let $\mathcal{F}$ be the set of all flows installed in a specific switch, while a flowset $F$ is an arbitrary set of flows (as in [2]). Let $\mathcal{H}$ be the set of all matchfield values from $\mathcal{F}$.

• A user query $q \in Q$ is a sequence of regular expressions on flow matchfields. Here, we assume a query is merely a set of matchfield values.

• A context $\mathcal{K}(\mathcal{F}, \mathcal{H}, \mathcal{M})$ associates $\mathcal{F}$ to $\mathcal{H}$ via an incidence relation $\mathcal{M} \subseteq \mathcal{F} \times \mathcal{H}$. In $\mathcal{K}$, two image operators $'$ lift $\mathcal{M}$ to the set level: a set of flows (matchfields) is mapped to the set of matchfields (flows) incident to all of its members (quantification is universal).

• A concept is a pair $(F, H)$, where $F \in \wp(\mathcal{F})$ (extent) and $H \in \wp(\mathcal{H})$ (intent) are such that $F = H'$ and $H = F'$. The set $\mathcal{C}_\mathcal{K}$ of all concepts in $\mathcal{K}$ is partially ordered by extent inclusion:

$$(F_1, H_1) \leq_\mathcal{K} (F_2, H_2) \Leftrightarrow F_1 \subseteq F_2 \quad (H_2 \subseteq H_1).$$

$(\mathcal{C}_\mathcal{K}, \leq_\mathcal{K})$ is a complete lattice, as meets $\wedge$ and joins $\vee$ are defined for arbitrary concept sets. The precedence $\prec_\mathcal{K}$, the transitive reduction of $\leq_\mathcal{K}$, induces the Hasse diagram of the lattice.

• Compositions $''$ of complementary images $'$ are closure operators on $\wp(\mathcal{F})$ and $\wp(\mathcal{H})$, respectively. The families of extents, $\mathcal{C}_\mathcal{K}^\mathcal{F}$, and of intents, $\mathcal{C}_\mathcal{K}^\mathcal{H}$, are closed under $\cap$. Thus, for a set $A$ of flows (of matchfields), $A''$ is the smallest extent (intent) comprising $A$.

• $\mathcal{T}$ is the set of target concepts: for a query $q$, its target is $\gamma(q) = (q', q'')$. $\mathcal{P}$ is the set of projections, i.e., the meets of non-empty sets of targets: $c_p = \bigwedge T_p$, $T_p \subseteq \mathcal{T}$. $\mathcal{G}$ is the set of ground projections: for a flow $f$, its ground is $\mu(f) = \min(\{(F, H) \in \mathcal{P} \mid f \in F\})$.
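The derivation operators and the target definition above can be made concrete on a small context. The following Python sketch is illustrative only. The toy context `CONTEXT` (flows `f0`..`f3` over matchfield values `h1`..`h5`) and all function names are hypothetical and are not taken from the paper's running example.

```python
# Hypothetical toy context K(F, H, M): each flow is mapped to its set of matchfield values.
CONTEXT = {
    "f0": {"h1", "h2", "h4"},
    "f1": {"h1", "h3", "h4"},
    "f2": {"h2", "h5"},
    "f3": {"h1", "h5"},
}
ALL_FIELDS = frozenset().union(*CONTEXT.values())


def flows_prime(flows):
    """F': the matchfield values shared by every flow in `flows` (universal quantification)."""
    return ALL_FIELDS.intersection(*(CONTEXT[f] for f in flows))


def fields_prime(fields):
    """H': the flows that carry every matchfield value in `fields`."""
    return frozenset(f for f, hs in CONTEXT.items() if set(fields) <= hs)


def is_concept(extent, intent):
    """(F, H) is a concept iff F = H' and H = F'."""
    return flows_prime(extent) == frozenset(intent) and fields_prime(intent) == frozenset(extent)


def target(query):
    """gamma(q) = (q', q''): the concept with the largest extent whose intent contains q."""
    extent = fields_prime(query)
    return extent, flows_prime(extent)
```

For instance, `target({"h4"})` yields the extent `{f0, f1}` with the closed intent `{h1, h4}`. The closure adds `h1` because both flows carry it.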

B. Problem statement 

The traffic measurement optimization problem consists in, given a set of flows $\mathcal{F}$ to monitor and a set of user queries $Q$, finding the minimal partition of $\mathcal{F}$ that still allows all user queries to be answered. In this setting, classical approaches use one partition block (i.e., one counter) per flow; therefore, traffic measurement resources are maximal in all usage contexts. Our running example, $\mathcal{K}(\mathcal{F}, \mathcal{H}, \mathcal{M})$, is shown in Table I.
C. Lattice construction

Algorithm 1 is a version of NextNeighbor in [5] (p. 35). It constructs the lattice from the top concept $(\mathcal{F}, \mathcal{F}')$ down to the bottom one $(\mathcal{H}', \mathcal{H})$ by generating the children of the current concept $(F, H)$. To that end, it first produces the extents of a larger set of sub-concepts by intersecting $F$ with the images of all matchfields outside of $H$. It then connects as children of $(F, H)$ only the maximal elements of the resulting set (and enqueues these for further processing). For instance, at concept $c_8 = (\{f_0, f_1, f_4, f_5\}, \{h_1, h_7\})$ in Figure 3, the following four extents are generated: $\emptyset$ (by intersections with $h_2'$, $h_3'$, $h_6'$ and $h_8'$), $\{f_0, f_1\}$ (with $h_4'$), $\{f_1, f_5\}$ (with $h_9'$ and $h_{10}'$), and $\{f_0, f_4\}$ (with $h_5'$). The latter three are maximal, hence they are the extents of the children concepts of $c_8$ ($c_2$, $c_{10}$, and $c_9$, respectively).

Fig. 3. Concept lattice of the input context: reduced concept intents/extents are provided to increase readability (objects are inherited upwards and attributes downwards)

Algorithm 1: Lattice construction algorithm
input : 𝓚(𝓕, 𝓗, 𝓜)
output: Set of linked concepts 𝓒_𝓚
1  ConceptQueue ← {(𝓕, 𝓕')};
2  while ConceptQueue ≠ ∅ do
3      c = (F_c, H_c) ← ConceptQueue.pop();
4      Children ← ∅;
5      foreach h in 𝓗 − H_c do
6          F_h = F_c ∩ h';
7          if ∃ d ∈ Children s.t. F_d = F_h then
8              H_d ← H_d ∪ {h};
9          else
10             Children ← Children ∪ {(F_h, H_c ∪ {h})};
11     c^≺ ← max(Children); ConceptQueue.push(c^≺);   // children of c
12     𝓒_𝓚 ← 𝓒_𝓚 ∪ {c};
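The top-down construction can be rendered compactly in Python. This is a sketch under assumptions: the context is a hypothetical dict from flow names to matchfield sets, and concepts are (extent, intent) pairs of frozensets. A visited set handles concepts reached from several parents, which the pseudocode leaves implicit. It is not the Coron implementation used in Section IV.

```python
from collections import deque


def build_lattice(context):
    """Top-down construction in the spirit of Algorithm 1 (NextNeighbor).

    context: dict mapping each flow to its set of matchfield values (hypothetical format).
    Returns (concepts, children): the set of (extent, intent) pairs and, for every
    concept, the set of its children (lower covers).
    """
    all_fields = frozenset().union(*context.values())
    top = (frozenset(context), all_fields.intersection(*context.values()))
    concepts, children = {top}, {}
    queue = deque([top])
    while queue:
        extent, intent = queue.popleft()
        # candidate extent F_c ∩ h' -> intent grown with every h that yields it (lines 5-10)
        candidates = {}
        for h in all_fields - intent:
            f_h = frozenset(f for f in extent if h in context[f])
            candidates.setdefault(f_h, set(intent)).add(h)
        # only maximal candidate extents are children of the current concept (line 11)
        kids = {(e, frozenset(hs)) for e, hs in candidates.items()
                if not any(e < other for other in candidates)}
        children[(extent, intent)] = kids
        for kid in kids:
            if kid not in concepts:  # enqueue each concept only once
                concepts.add(kid)
                queue.append(kid)
    return concepts, children


# Hypothetical toy context (not the paper's Table I).
TOY = {"f0": {"h1", "h2", "h4"}, "f1": {"h1", "h3", "h4"},
       "f2": {"h2", "h5"}, "f3": {"h1", "h5"}}
concepts, children = build_lattice(TOY)
```

Grouping every `h` that produces the same intersection yields the full intent of each maximal child. This is the face property recalled at the start of Section III.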

D. Flowset partition identification 

The ultimate goal is to split $\mathcal{F}$ into disjoint sets, each assigned a single counter. Assume a query set $Q = \{q_i\}_{i=1..5}$ with $q_1 = \{h_{10}\}$, $q_2 = \{h_2, h_6, h_8\}$, $q_3 = \{h_1\}$, $q_4 = \{h_1, h_4, h_7\}$ and $q_5 = \{h_7\}$. Given the concept set $\mathcal{C}_\mathcal{K}$ and $Q$, Algorithm 2 parses $\mathcal{C}_\mathcal{K}$ to identify $\mathcal{T}$, $\mathcal{P}$ and $\mathcal{G}$. Projection computation is supported by a bitvector whose value for $c = (F, H)$ reflects the queries satisfied by the flows in $F$. Formally, the query vector $v(c)$ is an $N$-bit string indicating which $q_i$ are matched by $H$:

$$v((F, H))[i] = \begin{cases} 1, & \text{if } q_i \subseteq H \\ 0, & \text{otherwise} \end{cases} \qquad 1 \leq i \leq N$$

Algorithm 2: Flowset partition identification algorithm
input : A list of concepts 𝓒,
        a set of queries Q = (q_1, .., q_i, .., q_N)
output: Target, projection and ground sets (𝓣, 𝓟, 𝓖)
1  Sort(𝓒);
2  foreach c in 𝓒 do
3      for q_i in Q do
4          if q_i ⊆ I_c then
5              v(c)[i] ← 1;
6              𝓣 ← 𝓣 ∪ {c}; Q ← Q − {q_i};
7      v(c) ← v(c) ∪ ⋃{v(d) | d ∈ c^≻};              // inherit the values of the parents
8      if |v(c)| > max{|v(d)| | d ∈ c^≻} then
9          𝓟 ← 𝓟 ∪ {c};
10     if |E_c| = 1 then                               // c is a flow concept
11         for p ∈ 𝓟 do
12             if v(p) = v(c) then
13                 𝓖 ← 𝓖 ∪ {p}; break();

TABLE III
Query vector values

First, the concept list $\mathcal{C}$ is sorted in decreasing order of extent size (line 1), to ensure that the first concept whose intent matches a $q \in Q$ is its target (line 4). Matched queries are removed from the list (line 6). In our example, the algorithm outputs the targets $c_6$, $c_5$, $c_8$, $c_2$ and $c_8$ for $q_i$ ($i = 1..5$), respectively. Then, the value of $v(c)$ is finalized (line 7): the local part (targeted queries, line 5) is merged with the inherited parent values (see results in Table III). Projections are concepts whose query vectors have more 1s than those of any of their parents (line 8). For instance, $c_{10}$ has three 1s, more than its parents $c_6$ (one) and $c_8$ (two), hence it is a projection (as the meet of the targets $c_6$ and $c_8$). Finally, the ground concept of an $f \in \mathcal{F}$ is the projection with the same query vector as the flow concept $(f'', f')$ (lines 10-13). Table IV provides the flow-to-ground mapping of our example.
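To make the bitvector mechanics concrete, the Python sketch below computes targets, query vectors, projections and grounds on a hypothetical toy context, not the paper's Table I example. It rests on several assumptions. Concepts are enumerated by brute force instead of being read from a lattice. Query vectors are sets of query indices computed directly from the definition of $v$ rather than by downward propagation. Flows matching no query are left ungrounded.

```python
from itertools import combinations

# Hypothetical toy context and queries (queries are sets of matchfield values).
CONTEXT = {"f0": {"h1", "h2", "h4"}, "f1": {"h1", "h3", "h4"},
           "f2": {"h2", "h5"}, "f3": {"h1", "h5"}}
QUERIES = [frozenset({"h1"}), frozenset({"h4"}), frozenset({"h5"})]


def concepts_of(context):
    """All (extent, intent) pairs, obtained by closing every subset of flows (toy sizes only)."""
    fields = frozenset().union(*context.values())
    found = set()
    for r in range(len(context) + 1):
        for subset in combinations(context, r):
            intent = fields.intersection(*(context[f] for f in subset))
            extent = frozenset(f for f in context if intent <= context[f])
            found.add((extent, intent))
    return found


def measurement_support(context, queries):
    cs = concepts_of(context)
    # v(c) as the set of indices i with q_i ⊆ H (bit i of the query vector)
    v = {c: frozenset(i for i, q in enumerate(queries) if q <= c[1]) for c in cs}
    # target of q_i: the concept with the largest extent whose intent contains q_i
    targets = {i: max((c for c in cs if q <= c[1]), key=lambda c: len(c[0]))
               for i, q in enumerate(queries)}

    def parents(c):
        above = [d for d in cs if c[0] < d[0]]
        return [d for d in above if not any(c[0] < e[0] < d[0] for e in above)]

    # projection test of line 8: strictly more 1s than every parent
    projections = {c for c in cs
                   if len(v[c]) > max((len(v[p]) for p in parents(c)), default=0)}
    grounds = {}
    for flow, fields in context.items():
        flow_concept = next(c for c in cs if c[1] == frozenset(fields))
        # lines 10-13: the projection sharing the flow concept's query vector
        ground = next((p for p in projections if v[p] == v[flow_concept]), None)
        if ground is not None:
            grounds[flow] = ground
    return targets, v, projections, grounds


targets, v, projections, grounds = measurement_support(CONTEXT, QUERIES)
```

In this toy case, `f0` and `f1` satisfy the same queries and therefore share the ground `({f0, f1}, {h1, h4})`. The four flows thus need three counters.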

E. Flowset extraction 

The optimal partition is composed of the flowsets grounded at the ground concepts: $\Phi_\mathcal{G} = \{g(c_g) \mid c_g \in \mathcal{G}\}$, with $g(c_g) = \{f \in \mathcal{F} \mid \mu(f) = c_g\}$. A hardware counter is assigned to each flowset in $\Phi_\mathcal{G}$, i.e., a total of $m = |\Phi_\mathcal{G}|$ counters. Since the grounded flowsets are pairwise disjoint, $m \leq |\mathcal{F}|$, with equality reached only when all flowsets are singletons. With $\mathcal{G}$ from Table IV, two flows share a common counter, whereas each remaining flow gets a dedicated counter.
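The counter assignment, and the way a query answer is assembled from counters, can be sketched as follows. The flow-to-ground labels, the packet counts and the helper names are hypothetical. Counter reads are simulated by summing per-flow packet counts. A counter enters a query answer when its flowset lies inside the query's answer set. By Theorem 3.10, this is equivalent to its ground lying below the query target.

```python
from collections import defaultdict


def assign_counters(ground_of):
    """One counter per ground: flows sharing a ground concept share a counter."""
    flowsets = defaultdict(set)
    for flow, ground in ground_of.items():
        flowsets[ground].add(flow)
    return {ground: frozenset(flows) for ground, flows in flowsets.items()}


def answer(answer_set, counters, counter_values):
    """Sum every counter whose flowset lies inside the query's answer set."""
    return sum(counter_values[g] for g, flows in counters.items() if flows <= answer_set)


# Hypothetical flow-to-ground mapping and per-flow packet counts.
GROUND_OF = {"f0": "g1", "f1": "g1", "f2": "g2", "f3": "g3"}
PACKETS = {"f0": 5, "f1": 3, "f2": 7, "f3": 2}

counters = assign_counters(GROUND_OF)
# Simulated hardware reads: each counter accumulates the packets of its flowset.
values = {g: sum(PACKETS[f] for f in flows) for g, flows in counters.items()}
```

A query satisfied by `f0`, `f1` and `f3` is answered by adding counters `g1` and `g3`. Only addition is needed, because the flowsets are disjoint.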
TABLE IV
Flow-to-ground concept mapping 
F. New flow entry insertion



Algorithm 3: Lattice update: Add a flow
input : Added flow entry f_n
input/output: Concept lists (𝓒, 𝓣, 𝓟, 𝓖)
1  𝓒_n ← ∅; M ← ∅;
2  foreach c = (E_c, I_c) in 𝓒 do
3      H ← I_c ∩ f_n';
4      if H = I_c then
5          c ← (E_c ∪ {f_n}, I_c); M ← M ∪ {c};           // modified concept
6      else if ∄ d ∈ 𝓒_n ∪ M s.t. I_d = H then
7          𝓒_n ← 𝓒_n ∪ {c_n = (E_c ∪ {f_n}, H)};          // new concept, genitor c
8          c^≻ ← c^≻ ∪ {c_n}; c_n^≺ ← c_n^≺ ∪ {c};
9          UpdateOrder(c_n, c, 𝓒_n, M);
10         UpdateStatus(c_n, c, 𝓣, 𝓟, 𝓖);
11 c_p ← Lookup(𝓟, v(μ(f_n)));                           // ground of f_n
12 g(c_p) ← g(c_p) ∪ {f_n};
13 𝓖 ← 𝓖 ∪ {c_p};
14 𝓒 ← 𝓒 ∪ 𝓒_n;



Assume a new flow $f_8$ with $f_8' = \{h_2, h_7, \ldots\}$ is added to the initial context. Algorithm 3 implements the schema in [5] to update the lattice $\mathcal{C}_\mathcal{K}$ to the lattice of $\mathcal{K}_n = (\mathcal{F}_n, \mathcal{H}, \mathcal{M}_n)$, where $\mathcal{F}_n = \mathcal{F} \cup \{f_n\}$ and $\mathcal{M}_n = \mathcal{M} \cup (\{f_n\} \times f_n')$. The basic task consists of producing all intersections of the new flow image $f_n'$ with intents from $\mathcal{C}_\mathcal{K}$. For intersections that are intents in $\mathcal{K}$, the extent of the underlying concept (qualified as modified) is updated with $f_n$ (line 5). For instance, $c_{12}$ yields $I_{c_{12}} \cap f_8' = I_{c_{12}}$, thus its extent is updated with $f_8$. The only other modified concept is $c_4$. Figure 4 presents the updated version of the lattice in Figure 3. Observe that concept numbers are IDs: concepts with the same numbers as in Figure 3 have the same intents.

An intersection missing in $\mathcal{C}_\mathcal{K}^\mathcal{H}$ triggers the creation of a new concept (only the first time). The intent of $c_n$ is the intersection itself, while the extent is the extent of the generating concept (alias the genitor) plus $f_n$ (line 7). In our example, $c_{19}$, $c_{20}$, $c_{21}$, and $c_{22}$ are the new concepts with genitors $c_8$, $c_9$, $c_5$, and $c_1$, respectively. Among them, $c_{22}$ is the flow concept of $f_8$.

To update the $\prec$ links, first $c_n$ and its genitor are linked as a parent and a child, respectively (line 8). Then, the children of $c_n$ are chosen among the already identified part of the new and modified concept sets (line 9).
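The extent/intent part of Algorithm 3 (lines 1-7) can be sketched in Python as below, under these assumptions:

- The lattice is a hypothetical dict from intent to extent.
- The new flow only uses matchfield values already in $\mathcal{H}$.
- Concepts are visited in decreasing extent size, so the first concept producing a new intersection is its genitor.
- Order links and status updates (lines 8-14) are omitted.

The toy context is illustrative and is not the paper's example.

```python
from itertools import combinations


def concepts_by_intent(context):
    """Brute-force lattice of a toy context as a dict intent -> extent."""
    fields = frozenset().union(*context.values())
    lattice = {}
    for r in range(len(context) + 1):
        for subset in combinations(context, r):
            intent = fields.intersection(*(context[f] for f in subset))
            lattice[intent] = frozenset(f for f in context if intent <= context[f])
    return lattice


def add_flow(lattice, new_flow, new_fields):
    """Return (updated lattice, modified intents, {new intent: genitor intent})."""
    new_fields = frozenset(new_fields)
    updated, modified, genitors = dict(lattice), set(), {}
    # top-down: larger extents first, so a genitor is met before other generators
    for intent, extent in sorted(lattice.items(), key=lambda kv: -len(kv[1])):
        h = intent & new_fields                  # line 3
        if h == intent:                          # lines 4-5: modified concept
            updated[intent] = extent | {new_flow}
            modified.add(intent)
        elif h not in updated:                   # lines 6-7: first time h appears
            updated[h] = extent | {new_flow}
            genitors[h] = intent
    return updated, modified, genitors


TOY = {"f0": {"h1", "h2", "h4"}, "f1": {"h1", "h3", "h4"},
       "f2": {"h2", "h5"}, "f3": {"h1", "h5"}}
updated, modified, genitors = add_flow(concepts_by_intent(TOY), "f4", {"h1", "h2"})
```

Here, adding `f4` creates a single new concept `({f0, f4}, {h1, h2})`, whose genitor is the concept with intent `{h1, h2, h4}`. The result coincides with the lattice recomputed from scratch, as Property 3.1 predicts.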

The UpdateStatus method, detailed by Algorithm 4, establishes the status of the new concept $c_n$ (target, projection, ground, none) and updates that of its genitor $c$. In our example, $c_{19}$ matches $q_5 = \{h_7\}$; thus, $q_5$ is re-targeted from the genitor $c_8$ to the new concept $c_{19}$ (line 3). The projection test (line 7) follows the one in Algorithm 2. Next, if the query vector of $c_n$ has the same number of 1s as the vector of $c$ (line 9), the latter is no longer a projection (line 10). In this case, all flows grounded at $c$ are re-grounded at $c_n$ (lines 11-13).

Fig. 4. The concept lattice of the extended context: only full intents/extents of relevant concepts are drawn.

Algorithm 4: UpdateStatus: Update support structure
input/output: Concepts c_n, c (the new concept and its genitor),
              concept sets (𝓣, 𝓟, 𝓖)
1  foreach q_i ∈ t(c) do
2      if q_i ⊆ I_{c_n} then
3          t(c_n) ← t(c_n) ∪ {q_i};
4          t(c) ← t(c) − {q_i};
5  if t(c_n) ≠ ∅ then 𝓣 ← 𝓣 ∪ {c_n};
6  if t(c) = ∅ then 𝓣 ← 𝓣 − {c};
7  if |v(c_n)| > max{|v(d)| | d ∈ c_n^≻} then
8      𝓟 ← 𝓟 ∪ {c_n};
9      if |v(c)| = |v(c_n)| then
10         𝓟 ← 𝓟 − {c};
11         if g(c) ≠ ∅ then
12             g(c_n) ← g(c); g(c) ← ∅;
13             𝓖 ← (𝓖 ∪ {c_n}) − {c};

Finally, in a post-processing step (lines 11-13 of Algorithm 3), the ground concept of the new flow $f_n$ is established as the projection $c_p \in \mathcal{P}$ with the same query vector as the flow concept $\mu(f_n)$ (line 11). In our example, $v(\mu(f_8)) = v(c_{22}) = 00001$, which is the same as the query vector of the projection $c_{19}$. The new flow $f_8$ is thus grounded at $c_{19}$ (lines 12-13).

To sum up the restructuring: the new target concept list is $\mathcal{T}_n = \mathcal{T} \cup \{c_{19}\}$, the new projections are $\mathcal{P}_n = \mathcal{P} \cup \{c_{19}\}$, and the new grounds are $\mathcal{G}_n = \mathcal{G} \cup \{c_{19}\}$.
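Lines 1-6 of Algorithm 4 amount to moving queries from the genitor to the new concept. The following is a hedged Python sketch. It assumes targets are kept as a hypothetical dict from concept IDs to sets of queries ($t(c)$ in the text), with queries represented as frozensets of matchfield values. The usage mirrors the re-targeting of $q_5 = \{h_7\}$ from $c_8$ to $c_{19}$ described above, while $q_3 = \{h_1\}$ stays at $c_8$.

```python
def retarget(targets, genitor_id, new_id, new_intent):
    """Move every query of the genitor contained in the new intent (Algorithm 4, lines 1-6).

    targets: dict concept id -> set of queries (frozensets of matchfield values)
    targeted at that concept; concepts absent from the dict are not targets.
    """
    moved = {q for q in targets.get(genitor_id, set()) if q <= new_intent}
    if moved:
        targets[new_id] = targets.get(new_id, set()) | moved  # c_n joins T
        targets[genitor_id] = targets[genitor_id] - moved
    if genitor_id in targets and not targets[genitor_id]:
        del targets[genitor_id]  # the genitor leaves T
    return targets


q3, q5 = frozenset({"h1"}), frozenset({"h7"})
t = retarget({"c8": {q3, q5}}, "c8", "c19", frozenset({"h7"}))
```

After the call, `c8` still targets `q3` and `c19` targets `q5`. Both concepts therefore remain in $\mathcal{T}_n$, matching the summary above.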

Our solution comprises algorithms for removing flows as well as for adding/removing queries or individual matchfield values, which we do not provide here for space reasons. All our algorithms follow a similar computational schema: they perform a traversal of the concept lattice where the most effort-intensive processing task for a concept boils down to one set-based operation on the intents/extents of each parent/child. Hence the algorithmic complexity is $O(|\mathcal{C}_\mathcal{K}| \cdot \ldots \cdot (|\mathcal{H}| + |\mathcal{F}|))$ [6]. Below, we show how the richness and regularity of the lattice and its counter-related substructure translate into algorithmic efficiency and optimality of the solution.

III. Proofs 

We establish below the correctness of our algorithms (Properties 3.1 to 3.9 and Theorem 3.10) and the minimality of the proposed counter assignment (Theorem 3.11).

A. Lattice construction and maintenance 

Algorithm 1 is rooted in the following results [7]: (1) for a concept $(F, H)$ with children $(F_i, H_i)$, the faces $H_i - H$ are pairwise disjoint, and (2) $\forall h \in H_i - H$, $h' \cap F = F_i$. Hence it constructs all $(F_i, H_i)$ from $(F, H)$ by producing all possible $h' \cap F$ for matchfields $h \in \mathcal{H} - H$ and partitioning those $h$ into the faces of $(F, H)$. Non-face matchfields from $\mathcal{H} - H$ generate smaller intersections that are filtered out.

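Algorithm 3 (lattice update) transforms $\mathcal{C}_\mathcal{K}$ into $\mathcal{C}_{\mathcal{K}_n}$, with the emphasis on the completion of the intent family $\mathcal{C}_\mathcal{K}^\mathcal{H}$ to $\mathcal{C}_{\mathcal{K}_n}^\mathcal{H}$ (since $\mathcal{H}_n = \mathcal{H}$). Indeed, existing intents provably remain valid in $\mathcal{K}_n$: $\mathcal{C}_\mathcal{K}^\mathcal{H} \subseteq \mathcal{C}_{\mathcal{K}_n}^\mathcal{H}$. Moreover, the new intents are intersections of $f_n'$ with elements from $\mathcal{C}_\mathcal{K}^\mathcal{H}$:

Property 3.1: $\mathcal{C}_{\mathcal{K}_n}^\mathcal{H} = \mathcal{C}_\mathcal{K}^\mathcal{H} \cup \{\{f_n\}' \cap H \mid H \in \mathcal{C}_\mathcal{K}^\mathcal{H}\}$.

Correspondingly, the extents of concepts in $\mathcal{C}_{\mathcal{K}_n}$ have only two possible forms: $F$ or $F \cup \{f_n\}$, where $F \in \mathcal{C}_\mathcal{K}^\mathcal{F}$.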

A new intent generates a new concept: although it may be produced more than once, a canonical generator, the genitor, exists that holds a two-fold bond to the new concept, i.e., through both its intent and its extent:

Property 3.2: For a $(F, H) \in \mathcal{C}_{\mathcal{K}_n}$ s.t. $H \notin \mathcal{C}_\mathcal{K}^\mathcal{H}$, $\exists (F_g, H_g) \in \mathcal{C}_\mathcal{K}$ s.t. $H = H_g \cap f_n'$ and $F = F_g \cup \{f_n\}$.

As a corollary, the genitor intent is the closure of the new one in $\mathcal{K}$: $H'' = H_g$.

Modified concept intents $H$ are provably s.t. $H \subseteq \{f_n\}'$ (closed in $\mathcal{K}$). Hence in $\mathcal{K}_n$ the respective extents $H'$ comprise $f_n$. Observe that genitor and modified concepts have intents that are the closures of their $H \cap \{f_n\}'$; hence they are the maximal concepts to produce it. Thus, they are the first ones reached along the top-down breadth-first traversal of the lattice. Finally, the adjustment of the new precedence among concepts in $\mathcal{C}_{\mathcal{K}_n}$ is skipped here (interested readers are directed to [7]).

B. Measurement support construction 

Observe that $\gamma(q)$ is the maximal $(F, H) \in \mathcal{C}_\mathcal{K}$ s.t. $q \subseteq H$, while $F$ is the set of all flows $f$ satisfying $q$ ($q \subseteq f'$). Moreover, $\mu(f)$ is well defined: it is the meet of the targets of the queries satisfied by $f$ ($\mu(f) = \bigwedge\{\gamma(q) \mid q \in Q, q \subseteq f'\}$).



The main tasks in Algorithm 2 are detecting all $\gamma(q)$ (the highest concept $(F, H)$ with $q \subseteq H$) and propagating the targeted $q$ downwards in the lattice. These $q$ are stored in the bitvectors $v()$ for further projection tests. Now, a projection $c$ is exactly the meet of the targets of the queries in $v(c)$:

Property 3.3: $c \in \mathcal{P}$ iff $c = \bigwedge\{\gamma(q_i) \mid v(c)[i] = 1\}$.

As a corollary, the projection extent is the intersection of the target ones ($F = \bigcap\{E_{\gamma(q_i)} \mid v(c)[i] = 1\}$). Then, $c$ is maximal for $v(c)$ and thus can be recognized by comparing its bitvector to those of its parent concepts (the set $c^{\succ}$):

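Property 3.4: $c \in \mathcal{P}$ iff $\forall \bar{c} \in c^{\succ}$, $v(\bar{c}) \neq v(c)$.

As $v()$ is readily shown to be monotonically non-increasing w.r.t. $\leq_\mathcal{K}$ (i.e., $c \leq_\mathcal{K} \bar{c}$ implies that every 1 of $v(\bar{c})$ is also a 1 of $v(c)$), the property may be recast in terms of cardinalities: $|v(c)| > \max_{\bar{c} \in c^{\succ}} |v(\bar{c})|$.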
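Properties 3.3 and 3.4 give two characterizations of projections. The first is meets of non-empty sets of targets. The second is concepts with strictly more 1s than each parent. The Python sketch below checks on a hypothetical toy context that both yield the same set. It is a brute-force illustration for small inputs, not part of FlowME, and it counts 1s directly instead of storing bitvectors.

```python
from itertools import combinations

# Hypothetical toy context and queries.
TOY = {"f0": {"h1", "h2", "h4"}, "f1": {"h1", "h3", "h4"},
       "f2": {"h2", "h5"}, "f3": {"h1", "h5"}}
QUERIES = [frozenset({"h1"}), frozenset({"h4"}), frozenset({"h5"})]


def concepts_of(context):
    """All (extent, intent) pairs of a toy context, by brute force."""
    fields = frozenset().union(*context.values())
    found = set()
    for r in range(len(context) + 1):
        for subset in combinations(context, r):
            intent = fields.intersection(*(context[f] for f in subset))
            found.add((frozenset(f for f in context if intent <= context[f]), intent))
    return found


def projections_as_meets(cs, queries):
    """Property 3.3: meets of non-empty sets of targets (meet extent = extent intersection)."""
    targets = [max((c for c in cs if q <= c[1]), key=lambda c: len(c[0])) for q in queries]
    meets = set()
    for r in range(1, len(targets) + 1):
        for group in combinations(targets, r):
            extent = frozenset.intersection(*(t[0] for t in group))
            meets.add(next(c for c in cs if c[0] == extent))
    return meets


def projections_by_parents(cs, queries):
    """Property 3.4, cardinality form: strictly more 1s than every parent."""
    ones = {c: sum(q <= c[1] for q in queries) for c in cs}

    def parents(c):
        above = [d for d in cs if c[0] < d[0]]
        return [d for d in above if not any(c[0] < e[0] < d[0] for e in above)]

    return {c for c in cs if ones[c] > max((ones[p] for p in parents(c)), default=0)}


cs = concepts_of(TOY)
```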

Finally, $\mathcal{G}$ is tested by comparing $v(c)$ to the bitvectors of flow concepts:

Property 3.5: For a $c = (F, H) \in \mathcal{P}$, $c \in \mathcal{G}$ iff $\exists f \in F$ s.t., for $\bar{c} = (f'', f')$, $v(c) = v(\bar{c})$.

Moreover, as in our specific case no flow has a subset of another flow's matchfields, $\forall f \in \mathcal{F}$, $f'' = \{f\}$. Thus flow concepts are exactly those with singleton extents.

C. Measurement support maintenance 

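To show that $\mathcal{T}$, $\mathcal{P}$ and $\mathcal{G}$ are correctly transformed into $\mathcal{T}_n$, $\mathcal{P}_n$ and $\mathcal{G}_n$, respectively, by Algorithm 4, observe that for $c \in \mathcal{C}_\mathcal{K}$, $v(c)$ keeps its value in $\mathcal{C}_{\mathcal{K}_n}$.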

Property 3.6: For a concept $c = (F, H) \in \mathcal{C}_{\mathcal{K}_n}$, if $H \in \mathcal{C}_\mathcal{K}^\mathcal{H}$ then $v_n(c) = v(\bar{c})$, where $\bar{c} = (H', H)$ in $\mathcal{K}$.

The reason is that $v(c)$ only depends on $H$ and $Q$, which remain stable in $\mathcal{K}_n$. Thus, the function $v_n()$ evolves from $v()$ by merely computing the values for the new concepts in $\mathcal{C}_n$ (value propagation matches the downward generation of $\mathcal{C}_n$).

Now, $\mathcal{T}_n$ may depart from $\mathcal{T}$ as some $q \in Q$ may change targets ($\gamma(q) \neq \gamma_n(q)$). Clearly, $\gamma_n(q)$ can only be a new concept in $\mathcal{C}_{\mathcal{K}_n}$, whereas $\gamma(q)$ is its genitor in $\mathcal{K}$.

Property 3.7: For a query $q \in Q$ s.t. $\gamma(q) \neq \gamma_n(q)$, $c_n = \gamma_n(q)$ is a new concept in $\mathcal{C}_{\mathcal{K}_n}$, while $c = \gamma(q)$ is its genitor.

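This follows from the minimality of $H''$ among the intents comprising $H$. Thus, with $c = (F, H)$ and $c_n = (F_n, H_n)$, we show that $H = H_n''$ in $\mathcal{K}$ (hence the genitor status). Indeed, assuming $H \neq H_n''$, we deduce $H_n'' \subset H$ (*). The reason is that $H$ remains an intent comprising $q$ in $\mathcal{K}_n$ (recall $q'' = H$), and $H_n$ is the closure of $q$ in $\mathcal{K}_n$ (the minimal intent comprising $q$), so $H_n \subseteq H$ and hence $H_n'' \subseteq H'' = H$. Yet since $q \subseteq H_n \subseteq H_n''$, (*) would contradict the minimality of $H = q''$ in $\mathcal{K}$.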

$\mathcal{P}_n$ evolves from $\mathcal{P}$ along two separate scenarios: (1) as with $\mathcal{T}_n$, a new concept may become a projection by eclipsing its genitor in $\mathcal{P}_n$; and (2) a new concept may become the infimum for a set of query targets with no equivalent in $\mathcal{P}$. Recall that projections are identified within $\mathcal{P}$ by $v()$ (Property 3.3). In other terms, in case one, the infimum $\bar{c}$ of a set of targets ($v(\bar{c})$) evolves to a different concept $c$ (diverging intents) with the same bitvector value ($v(\bar{c}) = v_n(c)$), whereas in case two, a previously nonexistent set of targets $v_n(c)$ arises.

Property 3.8: Given a $c \in \mathcal{P}_n$, if $c$ is not the equivalent of the projection concept $\bar{c} = \bigwedge\{\gamma(q_i) \mid v_n(c)[i] = 1\}$ in $\mathcal{K}$, then $c$ is a new concept with $\bar{c}$ as its genitor.






Assume that for some $c = (F, H) \in \mathcal{P}_n$, the projection $\bar{c} = \bigwedge\{\gamma(q_i) \mid v_n(c)[i] = 1\}$ from $\mathcal{P}$ is such that $\bar{c} = (\bar{F}, \bar{H})$ and $\bar{H} \neq H$ ($\bar{c}$ is not an equivalent concept in $\mathcal{K}_n$, despite $v(\bar{c}) = v_n(c)$). The latter means $F \neq \bar{F}$. Since these are the intersections of the target extents from $v_n(c)$ in $\mathcal{K}_n$ and in $\mathcal{K}$, respectively, it follows that those extents have changed. As we saw previously, the only possible evolution of a target extent for a query $q$ in $\mathcal{K}_n$ is to increase by $f_n$. Consequently, their intersection $F$ (corollary of Property 3.3) comprises $f_n$ as well. As $H$ is not an intent in $\mathcal{K}$, $c$ is a new concept. By the same argument, $\bar{F}$ can only be $\bar{F} = F - \{f_n\}$; hence $\bar{c}$ is the genitor of $c$ (Property 3.2).

In case two, the new projection $c$ is a new concept too:

Property 3.9: Given a $c = (F, H) \in \mathcal{P}_n$, if for all projections $\bar{c} \in \mathcal{P}$, $v_n(c) \neq v(\bar{c})$, then $c$ is a new concept.

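Assuming the opposite, let $H \in \mathcal{C}_\mathcal{K}^\mathcal{H}$; hence $c$ is not a new concept and thus $v_n(c) = v(\bar{c})$ for $\bar{c} = (H', H)$ in $\mathcal{K}$ (Property 3.6). Consequently, there is a concept with the same bitvector value in $\mathcal{K}$, namely $\bar{c}$ itself. This further means there must be a maximal concept $\hat{c}$ s.t. $v(\hat{c}) = v(\bar{c})$, i.e., an infimum of the corresponding targets and hence a projection in $\mathcal{P}$. This contradicts the starting hypothesis.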

$\mathcal{G}_n$ being a subset of $\mathcal{P}_n$, similar evolution patterns hold. In the above case one, all flows grounded at the genitor (which vanishes from $\mathcal{P}_n$, hence from $\mathcal{G}_n$) must be re-grounded at the new projection $c_n$. In case two, no flow from $\mathcal{F}$ could be grounded in $c_n$, since their flow concepts in $\mathcal{C}_{\mathcal{K}_n}$ have intents from $\mathcal{C}_\mathcal{K}^\mathcal{H}$. Thus, the respective bitvectors do not change in $\mathcal{K}_n$. Hence all such flows are grounded in a $c$ whose $v_n(c)$ already existed in $\mathcal{P}$. Therefore, $f_n$ is the only candidate for case-two grounds.

To sum up, in $\mathcal{T}_n$, new concepts of target genitors grab the targeted queries comprised in their respective intents, while genitors with no remaining queries vanish. In $\mathcal{P}_n$, new concepts are tested for projection and, if positive, their genitors too. In $\mathcal{G}_n$, flows grounded at a shifting projection move from the genitor to the new concept. Finally, $\mu(f_n)$ is found.

D. Correctness and minimalness of counter assignment 

We prove that ground concept-based counter assignment is (1) correct, and (2) of minimal cardinality. Recall that each ground $c_g \in \mathcal{G}$ is assigned a counter whose support is the set of grounded flows, denoted $g(c_g) = \{f \in \mathcal{F} \mid \mu(f) = c_g\}$. This is a unique counter assignment (by uniqueness of $\mu(f)$), and w.l.o.g. we assume that each flow is grounded. Furthermore, for each $q \in Q$ the relevant counters compose a sum; let the underlying total set of flows be $S(q)$. As a counter enters a query sum iff its ground is below the corresponding target, we have, $\forall f \in \mathcal{F}, q \in Q$: $f \in S(q)$ iff $\mu(f) \leq_\mathcal{K} \gamma(q)$.

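Correctness means that $S(q)$ is the set of flows satisfying $q$:

Theorem 3.10: $\forall q \in Q, f \in \mathcal{F}$: $q \subseteq f'$ iff $\mu(f) \leq_\mathcal{K} \gamma(q)$.

'If': follows from intent inclusion along $\leq_\mathcal{K}$: $q \subseteq I_{\gamma(q)} \subseteq I_{\mu(f)} \subseteq f'$. 'Only if': observe that satisfaction means $f \in E_{\gamma(q)}$, and assume, by reductio ad absurdum, that $\mu(f) \not\leq_\mathcal{K} \gamma(q)$. Then the infimum $c_{q,f} = \mu(f) \wedge \gamma(q)$ is in $\mathcal{P}$ (as $\wedge$ is associative), whereas $f \in E_{c_{q,f}}$. Yet this contradicts $\mu(f) \not\leq_\mathcal{K} \gamma(q)$, as then $c_{q,f} <_\mathcal{K} \mu(f)$ while holding $f$, against the minimality of $\mu(f)$ in $\mathcal{P}$ among the concepts holding $f$.

Conversely, redundancy in $S(q)$ is excluded since a relevant flow $f$ appears exactly once in it (through its unique ground $\mu(f)$).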


Minimality means that no unique counter assignment over a smaller set of counters could answer all $q$ in $Q$. We focus on the underlying partition of $\mathcal{F}$:

Theorem 3.11: Let $cpt : \mathcal{F} \to \wp(\mathcal{F})$ with $cpt(f) = F$ iff $f \in F$, and assume $|ran(cpt)| < |\mathcal{G}|$. Then $\exists q \in Q$ s.t. $S(q)$ is not decomposable into the union of some sets from $ran(cpt)$.

By reductio ad absurdum, assume all $S(q)$ are unions of $cpt(f)$ for flows $f$ from a well-chosen set. A straightforward combinatorial argument yields $\exists f_1, f_2 \in \mathcal{F}$ s.t. $\mu(f_1) \neq \mu(f_2)$ (**) yet $cpt(f_1) = cpt(f_2)$. Now (**) means $v(\mu(f_1)) \neq v(\mu(f_2))$, and w.l.o.g. we can assume $\exists q_a \in Q$ s.t. $q_a \subseteq f_1'$ but $q_a \not\subseteq f_2'$. However, this contradicts the initial hypothesis since there is no way to correctly decompose $S(q_a)$ into a union of $cpt(f)$. If $cpt(f_1)$ participates, there is no way to remove the contribution of $f_2$ (subtraction is not available). Otherwise, there is no way to recover the contribution of $f_1$ ($cpt(f_1)$ is its unique counter).
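Theorem 3.11 can be illustrated numerically. A flow's ground is determined by the set of queries it satisfies, i.e., the bitvector shared by its flow concept and its ground (Algorithm 2, lines 10-13). The ground partition can therefore be obtained by grouping flows on that set. The hedged Python sketch below works on a hypothetical toy context, and all names are illustrative only. It checks that this partition answers every query as a union of blocks, and that merging any two blocks breaks some answer.

```python
def signature(fields, queries):
    """Indices of the queries satisfied by a flow (the bitvector of its ground)."""
    return frozenset(i for i, q in enumerate(queries) if q <= fields)


def ground_partition(context, queries):
    """Group flows by signature; flows satisfying no query get no counter."""
    blocks = {}
    for flow, fields in context.items():
        sig = signature(fields, queries)
        if sig:
            blocks.setdefault(sig, set()).add(flow)
    return [frozenset(b) for b in blocks.values()]


def answerable(partition, context, queries):
    """True iff every answer set S(q) is exactly a union of partition blocks."""
    for q in queries:
        answer = {f for f, fields in context.items() if q <= fields}
        covered = set().union(*(b for b in partition if b <= answer))
        if covered != answer:
            return False
    return True


# Hypothetical toy context and queries.
TOY = {"f0": {"h1", "h2", "h4"}, "f1": {"h1", "h3", "h4"},
       "f2": {"h2", "h5"}, "f3": {"h1", "h5"}}
QUERIES = [frozenset({"h1"}), frozenset({"h4"}), frozenset({"h5"})]
partition = ground_partition(TOY, QUERIES)
```

With three blocks for four flows, every query is answerable. Any two-block merge fails, which is the smaller-partition case ruled out by the theorem.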

IV. Implementation and Results 

A FlowME implementation over an OpenFlow switch was studied along three measurement axes: (1) memory cost expressed in terms of the number of managed counters, (2) processing effort for concept lattice generation/update, and (3) performance overhead on packet processing. We chose OpenFlow because flow entries and user queries can be pushed to and retrieved from the OpenFlow tables of the switch. Experimental results show a large reduction in the number of hardware counters, with a reasonable overall computational effort and negligible interference with traffic.

A. Testbed design 

The FlowME testbed comprises an OpenFlow switch with per-flow counter support, a flow entry generator, a Collector and user applications that generate queries. As shown in Figure 5, the FlowME Collector gets the set of flow entries $\mathcal{F}$ installed in the flow table of the OpenFlow switch (a) and the user queries $Q$ (b). It calls upon the lattice algorithms of the Coron FCA suite [8] to calculate/maintain the optimal flow entry partition and exploits it to place flow counter references (c). Next, traffic matching $\mathcal{F}$ increments hardware counters (d). The FlowME Collector reads counter values (e), calculates query answers and sends them to user applications (f). We experimented with FlowME using a variety of flow entry and query distributions.

1) Flow entry benchmarking: The flow entry benchmark generates up to 12 fields per entry. It is based on Flexible Rule Generator [9], a user-controlled benchmarking tool for evaluating packet forwarding algorithms that generates sets of OpenFlow flow entries based on predefined matchfield distributions. We extract matchfield distributions from packet traces provided by packetlife.net. These traces are particularly interesting since their packet headers contain different types of fields, such as MAC, VLAN, IP and transport fields. As a result, a total of 12 standard OpenFlow matchfields are used in the benchmark. We analyse packet trace headers to determine a distribution function for each matchfield. Table V shows a set of matchfields and their value distributions. Notice that the IP source and destination analysis involves both the prefix distribution and the prefix length distribution.

Fig. 5. FlowME testbed

TABLE V
Packet trace matchfield value distribution (values with density ≥ 3%)

Matchfield     Distribution
MAC src        00:40:05 (39%), 08:00:07 (13%), 00:60:08 (19%)
MAC dst        00:60:08 (33%), FF:FF:FF (37%), 00:40:05 (19%)
Ethertype      0x8100 (98%)
VLAN id        32 (56%), 104 (17%), 108 (4%), 6 (6%)
IP protocol    0x06 (80%), 0x11 (6%), 0x01 (13%)
TOS            0 (96%), 192 (3%)
L4 src port    2212 (41%), 1815 (26%), 2388 (11%), 8 (4%)
L4 dst port    1815 (53%), 2212 (18%), 2388 (8%), 3314 (4%)



2) Query benchmarking: The second benchmark generates application queries. A query covers a set of flow entries whose size depends on how many matchfields get a non-wildcard value. In our experimental study, we generate user queries with the same matchfield value distribution as flow entries, into which we insert some wildcarded values (a specific percentage for each matchfield). Moreover, we force each query to cover at least one flow entry. To that end, we first extract $n$ flow entries from $\mathcal{F}$ and then insert a specific percentage of wildcards in each matchfield, thus yielding a set of $n$ queries.
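A minimal Python sketch of this query generation follows, under these assumptions:

- Flow entries are hypothetical dicts from matchfield names to values.
- Each matchfield is wildcarded independently with its own probability, which is one reading of "a specific percentage for each matchfield".
- A query keeps only its exact-match (field, value) pairs.

Because every query is derived from a sampled flow entry, it covers at least that entry. The sample entries reuse values from Table V purely for illustration.

```python
import random


def make_queries(flow_entries, n, wildcard_prob, seed=0):
    """Derive n queries from n distinct sampled flow entries.

    wildcard_prob: dict matchfield -> probability of replacing that field by a wildcard
    (fields missing from the dict are never wildcarded).
    A query is the frozenset of its remaining exact-match (field, value) pairs.
    """
    rng = random.Random(seed)
    queries = []
    for entry in rng.sample(flow_entries, n):
        exact = {(field, value) for field, value in entry.items()
                 if rng.random() >= wildcard_prob.get(field, 0.0)}
        queries.append(frozenset(exact))
    return queries


def covers(query, entry):
    """True iff the flow entry carries every exact-match value of the query."""
    return all(entry.get(field) == value for field, value in query)


ENTRIES = [{"ip_proto": 6, "l4_dst": 1815}, {"ip_proto": 6, "l4_dst": 2212},
           {"ip_proto": 17, "l4_dst": 2388}, {"ip_proto": 1, "l4_dst": 3314}]
queries = make_queries(ENTRIES, 3, {"ip_proto": 0.1, "l4_dst": 0.9})
```

Raising a field's wildcard probability makes queries less specific. Less specific queries cover more entries, which is the knob varied across Figures 7 to 9.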

B. Switch implementation 

We use the 100 Gig OpenFlow implementation over EZchip's network processor introduced in [10], to which we add flow counter support. In our solution, each flow in memory is split across a set of pipelined tables (see Figure 6). Each table is implemented using a TCAM for flow matchfield classification and a hash table for storing flow entry instructions and counter references. Flow tables reference counters in a contiguous memory space where SRAM holds the first 8192 counters and RLDRAM holds the rest. In a typical scenario, such as the one shown in Figures 5 and 6, when a packet is received, the relevant header fields are parsed by a Parse engine and a key is built. A lookup is performed by a Search engine in the TCAM against flow entry matchfields and, if they match, the TCAM provides the index of the matching flow entry. In order to retrieve the counters and instructions associated with that entry, a second lookup is performed in the hash table based on the entry index. A Resolve engine receives the entry and processes its hardware counter reference and instructions (prepared for eventual execution). The processing repeats on subsequent flow tables. At the very end of flow identification, the Resolve engine sends a hardware counter increment command to the Statistics Block via a dedicated routine.

Fig. 6. Implementation over an OpenFlow switch
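The contiguous counter space described above can be modeled as follows. This Python sketch only illustrates the SRAM/RLDRAM split. The 8192-counter SRAM boundary comes from the text. The function names and the (memory, offset) representation are hypothetical and do not describe the EZchip API.

```python
SRAM_COUNTERS = 8192  # SRAM holds the first 8192 counters of the contiguous space


def counter_location(index):
    """Map a global counter index to a (memory, local offset) pair."""
    if index < 0:
        raise ValueError("counter index must be non-negative")
    if index < SRAM_COUNTERS:
        return ("SRAM", index)
    return ("RLDRAM", index - SRAM_COUNTERS)


def split_counters(n_counters):
    """Number of counters (indices 0..n-1) placed in SRAM and in RLDRAM."""
    in_sram = min(n_counters, SRAM_COUNTERS)
    return in_sram, n_counters - in_sram
```

Under this model, 10000 per-flow counters fill the SRAM and spill 1808 counters into RLDRAM. By contrast, 3555 aggregated counters fit entirely in SRAM.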

C. Memory cost 

Per-flow traffic measurement tools manage an individual 
counter for each traffic flow processed by the system (the 
set J 7 ) and report individual flow statistics to a centralized 
collector. In those systems, the counter number N c evolves 
linearly with {J 7 ] since, in practice, for each flow /e J, the 
system may need different traffic metrics, e.g. the number of 
matching packets or their total size. In contrast, our solution 
relies on aggregated counters, so we ran FlowME with the 
above benchmarks and observed the N c value. The evolution 
of the number of counters to maintain in order to answer a 
set of user queries Q of size Nq is depicted in Figures [7] 
[8] and [9] In Figure [7] query field values are composed of 
10% exact match values and 90% wild-cards. In Figures [8] 
and [9] queries are more specific with 50% and 90% exact 
match values, respectively. For example, the first experiment 
shows that for Nq = 1000, N c is significantly lower than 
the 10000 per-flow counters of the base-line solution (949 
to 3555, depending on wild-card distribution). As a general 
trend, we observe that as less specific queries cover more flow 
entries each, the number of (minimal) intersections is higher 
and thus the counter set grows larger. Let M be the available 
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Fig. 7. Number of managed counters for varying query set sizes. Query field 
value distribution: 10% exact-match 90% wildcard 



Fig. 9. Number of managed counters for varying query set sizes. Query field 
value distribution: 90% exact-match 10% wildcard 
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memory size reserved for traffic measurement, M u the set of 
managed counter registers in a specific measurement period. 
M u is bounded by the number of flow entries installed in 
the switch. In traditional flow-based measurement schemes, 
M u = \J-\. In contrast, FlowME is an application-aware 
measurement tool, hence \F\ is the worst case that occurs 
only if the ground intersections generated by Q split T into 



its singleton components. Table VI illustrates relative memory 



consumption for our solution in experiment one (Figure [7] with 
N Q = 1000). 

TABLE VI

Comparison in memory usage ($N_Q = 1000$)

Technique     # of counters     SRAM usage
Per-flow      8192              100%
FlowME        3555              43%
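The SRAM percentages in Table VI follow from the 8192-counter SRAM capacity stated in the implementation description: the SRAM share is $\min(N_c, 8192)/8192$, and any counters beyond that spill into RLDRAM. The helper below makes this arithmetic explicit; its name is illustrative, and the 10000-counter input for the per-flow baseline is the figure quoted in the text above.

```python
SRAM_CAPACITY = 8192  # counters held in SRAM; the remainder goes to RLDRAM


def counter_placement(n_counters, sram_capacity=SRAM_CAPACITY):
    """Return (counters in SRAM, counters in RLDRAM, SRAM usage in percent)."""
    in_sram = min(n_counters, sram_capacity)
    in_rldram = n_counters - in_sram
    return in_sram, in_rldram, 100.0 * in_sram / sram_capacity


for label, n in [("Per-flow baseline", 10000), ("FlowME, Figure 7 setting", 3555)]:
    sram, rldram, pct = counter_placement(n)
    print(f"{label}: {sram} in SRAM, {rldram} in RLDRAM, SRAM usage {pct:.0f}%")
```

This prints a saturated SRAM (8192 counters, 100%) with 1808 counters spilling into RLDRAM for the per-flow baseline, and 3555 counters occupying 43% of SRAM with nothing in RLDRAM for FlowME.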



D. Lattice structure generation time 

The flexibility of FlowME is assessed by measuring the 
update time for a new flow entry. Two main operations 
are monitored: (1) lattice update and (2) ground concept 
identification and flow partition extraction. In our experiment, 
10000 flow entries are split into groups of 100 to be sent for 
lattice update at subsequent steps of the incremental process. 



Figure 10 shows the number of flow entries added at each step in the experimentation period. For instance, in the first 8 s, 1000 flows are added. Overall, it took a total of 5 min 30 s to build the lattice and identify flow entry partitions incrementally.

[Fig. 10 plot: time (s) axis; curves for 100, 500 and 1000 queries with 50% and 90% wild-cards, and for 10000 queries with 90% wild-cards]

Fig. 10. Structure incrementation time: A group of 100 flow entries is added to the structure at each iteration
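The incremental procedure (10000 flow entries submitted in groups of 100, each step updating the structure and extracting the flow partition) can be sketched as a small timing harness. The lattice update itself is not reproduced here: as a hypothetical stand-in, the sketch incrementally groups flows by the set of queries that match them, which yields a disjoint flow partition. Flows and queries are modelled as tuples of field values with `None` as a wildcard; `matches`, `IncrementalPartition` and `run_incremental` are illustrative names, and timings measured with this sketch are not comparable to the figures reported above.

```python
import time
from collections import defaultdict


def matches(query, flow):
    """A query matches a flow if every non-wildcard (non-None) field is equal."""
    return all(q is None or q == f for q, f in zip(query, flow))


class IncrementalPartition:
    """Stand-in for the lattice: flows grouped by the set of queries covering them."""

    def __init__(self, queries):
        self.queries = list(queries)
        self.groups = defaultdict(set)  # frozenset of query ids -> flow entries

    def add_flows(self, flows):
        for flow in flows:
            covering = frozenset(i for i, q in enumerate(self.queries) if matches(q, flow))
            if covering:  # flows covered by no query need no counter
                self.groups[covering].add(flow)

    def partition(self):
        return list(self.groups.values())


def run_incremental(flows, queries, batch_size=100):
    """Add flows batch by batch; record (flows added so far, elapsed seconds)."""
    structure = IncrementalPartition(queries)
    start = time.perf_counter()
    progress = []
    for i in range(0, len(flows), batch_size):
        structure.add_flows(flows[i:i + batch_size])  # structure update
        structure.partition()  # partition extraction after each update
        progress.append((min(i + batch_size, len(flows)), time.perf_counter() - start))
    return structure, progress


if __name__ == "__main__":
    flows = [(s, d) for s in range(100) for d in range(100)]  # 10000 synthetic entries
    structure, progress = run_incremental(flows, [(0, None), (None, 5), (0, 5)])
    print(len(progress), "steps,", len(structure.partition()), "flow groups")
```

Recording cumulative (flows added, elapsed time) pairs per batch gives data of the same shape as the curves in Figure 10.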



E. Packet processing performance 

The main challenge in traffic measurement is to minimize the impact on packet processing performance (measured as time in clock cycles). In our testbed, which uses a 400 MHz network processor, the average total packet processing time is 2447 ns. Since FlowME maintains counters only for flows covered by user queries, packets of the remaining flows skip the counter increment and are therefore processed faster. As indicated above, we implemented a routine for the packet resolving stage that increments flow counters based on the reference retrieved at the searching stage. Processing time is measured by placing start/end timestamps around this routine. According to the experimental outcome, the increment routine takes 9 clock cycles. Hence, for a packet of an unmanaged flow, FlowME lowers the total processing time by 22.5 ns.
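The 22.5 ns figure follows from the clock rate: at 400 MHz one cycle lasts 1000/400 = 2.5 ns, so skipping the 9-cycle increment routine saves 22.5 ns, roughly 0.92% of the 2447 ns average packet processing time. The small helper below (its name is illustrative) makes the arithmetic explicit.

```python
def cycles_to_ns(cycles, clock_mhz):
    """Duration of `cycles` clock cycles in nanoseconds (one cycle = 1000 / clock_mhz ns)."""
    return cycles * 1000 / clock_mhz


AVG_PACKET_NS = 2447  # average total packet processing time reported above

saving_ns = cycles_to_ns(9, 400)  # increment routine skipped for unmanaged flows
share = saving_ns / AVG_PACKET_NS
print(f"saving: {saving_ns} ns ({share:.2%} of {AVG_PACKET_NS} ns)")
```

This prints `saving: 22.5 ns (0.92% of 2447 ns)`.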

V. Related work 

The closest approach in the literature is the ProgME [2] traffic measurement tool. ProgME reflects application requirements through a rich query language in which $\cap$, $\cup$ and $\setminus$ are used to compose queries from simpler ones. To answer a set of queries, ProgME decomposes them into a set of disjoint flowsets and assigns a counter to each one. In that respect, it does not rely on predefined flows. In contrast, our flows, and our queries for that matter, are defined as simple conjunctions of matchfield values. However, the same set-theoretic operators on the answer sets of flows corresponding to queries are successfully simulated by our ground projection-based partition. In particular, the set of flows grounded in a projection represents the set-theoretic difference between the projection extent and the union of all extents of smaller projections. As a result, our partition of $\mathcal{F}$ induces a local partition of each answer set of flow entries. Our main advantage over ProgME lies in the (proven) minimality of the counter-assignment solution: the lattice structure ensures that none of the implicit set operations is redundant, whereas the disentangling algorithm in ProgME offers no such guarantee.
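The "extent minus the union of smaller extents" construction can be illustrated on a flat flow-by-query incidence. The sketch below is a simplified stand-in, not FlowME's lattice algorithm, and it does not reproduce the paper's minimality proof. Assumptions: flows and queries are tuples of matchfield values, `None` marks a wildcard, and `ground_partition` and `answer_queries` are hypothetical names. For each set $S$ of queries that jointly cover some flow, the extent of $S$ is the set of flows matched by every query in $S$; the flows grounded at $S$ are those not found in the (smaller) extent of any strictly larger query set. The resulting blocks are disjoint, and they form the coarsest partition of the covered flows in which every query's answer set is a union of blocks, so each query is answered by summing one counter per block.

```python
from itertools import product


def matches(query, flow):
    """A query matches a flow if every non-wildcard (non-None) field is equal."""
    return all(q is None or q == f for q, f in zip(query, flow))


def ground_partition(flows, queries):
    """Map each query set S to the flows grounded at S (hypothetical sketch).

    extent(S) holds the flows matched by every query in S. The flows grounded at S
    are those of extent(S) that belong to no extent(T) with T a strictly larger query
    set, i.e. extent(S) minus the union of the smaller extents below it.
    """
    cover = {f: frozenset(i for i, q in enumerate(queries) if matches(q, f)) for f in flows}
    intents = {s for s in cover.values() if s}  # flows covered by no query are ignored
    extent = {s: {f for f in flows if s <= cover[f]} for s in intents}
    grounded = {}
    for s in intents:
        smaller = set().union(*(extent[t] for t in intents if t > s))
        grounded[s] = extent[s] - smaller
    return grounded


def answer_queries(grounded, queries, packet_counts):
    """One counter per grounded block; each query answer is a sum of block counters."""
    counters = {s: sum(packet_counts[f] for f in block) for s, block in grounded.items()}
    return [sum(c for s, c in counters.items() if i in s) for i in range(len(queries))]


if __name__ == "__main__":
    flows = list(product(range(4), range(4)))  # hypothetical (src, dst) flow entries
    queries = [(0, None), (None, 1), (0, 1), (2, None)]  # None = wildcard
    grounded = ground_partition(flows, queries)
    print(len(grounded), "counters instead of", len(flows))
```

On this toy input, 5 counters suffice instead of 16 per-flow counters, and the 6 flows matched by no query receive no counter at all.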

AutoFocus [11] is a tool for offline hierarchical traffic analysis whose goal is complementary to ours. Like ProgME, it does not use predefined flows but rather discovers them. To that end, it mines hierarchies of frequent generalized values for each matchfield and combines them into a global multi-dimensional structure. The structure comprises both the most significant flows and some deviant ones. As the authors themselves acknowledge, the approach boils down to mining frequent generalized patterns on multiple dimensions. In comparison, our lattice contains the frequent closed patterns of matchfield values from $\mathcal{F}$, which form a strict subset of all frequent patterns.
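To make the distinction concrete: a pattern (a set of field=value pairs) is frequent if at least `min_support` flow entries contain it, and closed if no strictly larger pattern has the same support. The brute-force sketch below enumerates both over a hypothetical toy flow table. It is exponential in the number of distinct field values and is meant only to show why closed patterns form a subset of the frequent ones, not how FlowME builds its lattice; the function names and the threshold are illustrative assumptions.

```python
from itertools import combinations


def support(pattern, flows):
    """Number of flows containing every (field, value) pair of the pattern."""
    return sum(all(flow[f] == v for f, v in pattern) for flow in flows)


def frequent_patterns(flows, min_support):
    """All non-empty patterns with support >= min_support (brute force)."""
    items = sorted({(f, v) for flow in flows for f, v in flow.items()}, key=repr)
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            if len({f for f, _ in combo}) < size:
                continue  # a pattern cannot assign two values to the same field
            s = support(combo, flows)
            if s >= min_support:
                result[frozenset(combo)] = s
    return result


def closed_patterns(frequent):
    """Keep patterns with no strictly larger frequent pattern of equal support."""
    return {p: s for p, s in frequent.items()
            if not any(p < q and s == t for q, t in frequent.items())}


if __name__ == "__main__":
    flows = [  # hypothetical flow entries
        {"proto": "tcp", "dst_port": 80, "src": "A"},
        {"proto": "tcp", "dst_port": 80, "src": "B"},
        {"proto": "tcp", "dst_port": 443, "src": "A"},
        {"proto": "udp", "dst_port": 53, "src": "A"},
    ]
    freq = frequent_patterns(flows, min_support=2)
    print(len(freq), "frequent,", len(closed_patterns(freq)), "closed")
```

Here `{dst_port=80}` is frequent but not closed, because every flow with port 80 is also TCP, so `{proto=tcp, dst_port=80}` has the same support and subsumes it.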

Current flow-based monitoring and collection systems like Cisco NetFlow, FlowScan [12] and sFlow [13] track all flow statistics continuously at a specific sampling rate. This generates a large number of transactions and a management bandwidth usage proportional to the number of flows, regardless of actual application needs. The comparison of FlowME with a flow-based measurement technique (Section IV-C) shows a large reduction in the number of managed counters.

Finally, since the introduction of metered traffic groups by the ISO accounting model [14], a number of architectures based on that notion have been proposed in IETF RFCs (e.g., [15]). Identifying traffic groups remains an open problem and is typically handled by network operations personnel. FlowME is a significant step toward its automation.

For in-depth coverage of the traffic measurement field, readers are referred to [?].

VI. Conclusion 

We presented FlowME, a lattice-based traffic measurement solution that, we believe, is a significant contribution to the field. Its mathematically founded approach amounts to partitioning the set of flow entries into a minimal number of subsets, each assigned a hardware counter. The main advantages thereof, efficient statistics computation and optimal resource usage, have been experimentally confirmed through an implementation over an OpenFlow switch: the results show both a significant reduction in the number of hardware counters (up to a factor of 10) and good packet processing performance. Moreover, our algorithmic methods are easy to implement while being highly flexible and adaptable to a wide range of contexts.



Within a broader scope, a crucial benefit of our solution is its predictability: the resources required by a user request can be assessed beforehand to ensure that their consumption stays within acceptable limits (e.g., 10%). Overall, our approach will enable user control over the set of statistics in OpenFlow 1.3 [16] instead of fixing them within the standard. Furthermore, the genericity of the mathematical solution makes it particularly suitable for future network protocols with open sets of packet fields (CDN [?], NDN [?], etc.).

Finally, the high versatility of the lattice-based framework enables large variations in the problem settings. For instance, in the short term, we will investigate reduced substructures of the lattice as a basis for counter assignment.